Riemannian approach to batch normalization

Neural Information Processing Systems

Batch normalization (BN) has proven to be an effective algorithm for deep neural network training by normalizing the input to each neuron and reducing the internal covariate shift. The space of weight vectors in the BN layer can be naturally interpreted as a Riemannian manifold, which is invariant to linear scaling of weights. Following the intrinsic geometry of this manifold provides a new learning rule that is more efficient and easier to analyze. We also propose intuitive and effective gradient clipping and regularization methods for the proposed algorithm by utilizing the geometry of the manifold. The resulting algorithm consistently outperforms the original BN on various types of network architectures and datasets.
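The scale invariance mentioned in the abstract can be checked numerically. The following is a minimal sketch, not the authors' code: the helper names `batch_norm` and `pre_activations` and the toy data are hypothetical, the stabilizing epsilon is assumed to be zero, and the scale factor is assumed to be positive. With a nonzero epsilon, as practical BN layers use, the invariance holds only approximately.

```python
import math

def batch_norm(values, eps=0.0):
    """Normalize a batch of scalar pre-activations to zero mean, unit variance.

    eps=0.0 is an assumption made here so the invariance is exact.
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def pre_activations(w, batch):
    """Compute w^T x for every input vector x in the batch."""
    return [sum(wi * xi for wi, xi in zip(w, x)) for x in batch]

# Toy batch of four 3-dimensional inputs (illustrative values only).
batch = [[1.0, 2.0, -1.0], [0.5, -0.3, 2.0], [-1.5, 0.7, 0.2], [2.0, 1.0, 1.0]]
w = [0.3, -0.8, 0.5]
scaled_w = [5.0 * wi for wi in w]  # same direction, five times the norm

out = batch_norm(pre_activations(w, batch))
out_scaled = batch_norm(pre_activations(scaled_w, batch))

# The two outputs agree up to floating-point error.
max_diff = max(abs(a - b) for a, b in zip(out, out_scaled))
```

Because only the direction of `w` affects the normalized output, the set of distinct weight vectors can be identified with points on a sphere, which is what motivates treating the weight space as a manifold.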


Reviews: Riemannian approach to batch normalization

Neural Information Processing Systems

Paper Summary: Starting from the observation that batch normalization induces a particular form of scale invariance on the weight matrix, the authors propose to learn the weights directly on the unit sphere instead. This is motivated by information geometry and framed as optimization on a Riemannian manifold, in particular the Stiefel manifold V(1,n), which contains the unit-length vectors. As the descent direction on the unit sphere is well known (eq 7), the main contribution of the paper is in extending popular optimization algorithms (SGD with momentum and Adam) to constrained optimization on the unit sphere. Furthermore, the authors propose orthogonality as a (principled) replacement for L2 regularization, which is no longer meaningful under norm constraints. The method is shown to be effective across two families of models (VGG, Wide ResNet) on CIFAR-10, CIFAR-100 and SVHN.
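A gradient step constrained to the unit sphere, as described in the summary, can be sketched as follows. This uses the standard construction (project the Euclidean gradient onto the tangent space, then move along the geodesic); it is not claimed to match the paper's eq 7 term for term. The function name `sphere_sgd_step`, the toy objective, and the learning rate are assumptions for illustration, and momentum and Adam are omitted.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def sphere_sgd_step(w, grad, lr):
    """One plain gradient step that keeps w on the unit sphere.

    Assumes w already has unit length. Hypothetical helper, not the paper's code.
    """
    # Remove the radial component so the gradient lies in the tangent space at w.
    radial = dot(w, grad)
    h = [g - radial * wi for g, wi in zip(grad, w)]
    h_norm = norm(h)
    if h_norm == 0.0:
        return list(w)  # already at a stationary point on the sphere
    # Follow the great circle in direction -h for arc length lr * ||h||.
    angle = lr * h_norm
    new_w = [math.cos(angle) * wi - math.sin(angle) * hi / h_norm
             for wi, hi in zip(w, h)]
    # Renormalize to counter floating-point drift.
    n = norm(new_w)
    return [x / n for x in new_w]

# Toy objective (assumed): minimize f(w) = -w . target over the unit sphere,
# whose minimizer is w = target.
target = [0.0, 0.0, 1.0]
w = [1.0, 0.0, 0.0]
for _ in range(200):
    grad = [-t for t in target]  # Euclidean gradient of f
    w = sphere_sgd_step(w, grad, lr=0.1)
```

Since every iterate has unit norm, an L2 penalty on `w` would be constant, which is why the summary notes that L2 regularization stops being meaningful under this constraint.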


Riemannian approach to batch normalization

Cho, Minhyung, Lee, Jaehyung

Neural Information Processing Systems